1 Scope



The present document describes the key components of the DTS Coherent Acoustics technology. It also lists all
frame header parameters of the DTS core and extension (XCh and X96k) streams. The remaining parameters of the DTS
bit streams are described in U.S. and other national patents, which are listed in the Intellectual Property Rights
clause of the present document in connection with the intellectual property rights (IPRs) of DTS. These patents are
published and publicly available.



2 References 

Void. 



3 Definitions and abbreviations 

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply: 

DTS Core Audio Stream: carries the coding parameters of up to 5.1 channels of the original LPCM audio at up to 24
bits per sample with a sampling frequency of up to 48 kHz

DTS Extended Audio Stream: delivers possible extended frequency bands of the primary audio channels as well as all
frequency components of channels beyond 5.1

NOTE: The extended audio stream must always be accompanied by the core stream.

DTS XCh Stream: one of the DTS extended streams; it carries the coding parameters obtained from encoding up to 2
additional channels of original LPCM audio at up to 24 bits per sample with a sampling frequency of up to 48 kHz

DTS X96k Stream: DTS extended audio stream that enables encoding of original LPCM audio at up to 24 bits per
sample with a sampling frequency of up to 96 kHz

NOTE: The stream carries the coding parameters used to represent all remaining audio components that are
present in the original LPCM audio but are not represented in the core audio stream.

LPCM: Linear Pulse Code Modulated sequence of digital audio samples 

QMF bank: specific filtering structure that provides the means of translating the time domain signal into the multiple 
sub-band domain signals 

Vector Quantization: term for the joint quantization of a block of signal samples or a block of signal parameters 

3.2 Abbreviations 

For the purposes of the present document, the following abbreviations apply: 

DTS Digital Theatre Systems 

LFE Low Frequency Effect Channel 

LPCM Linear Pulse Code Modulation 

QMF Quadrature Mirror Filter 

VQ Vector Quantization 
4 Summary



DTS Coherent Acoustics is designed to deliver studio-quality digital audio reproduction in the home, in terms of both
fidelity and sound stage imagery. Specifically, it delivers up to eight discrete channels of multiplexed audio at sampling
frequencies of 8 kHz to 192 kHz and bit rates of 32 kbit/s to 6 144 kbit/s. The encoding algorithm works at 24 bits per
sample and can deliver compression ratios from 3:1 up to 40:1.
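The ratio figures above can be made concrete with a little arithmetic: a compression ratio is the uncompressed LPCM bit rate divided by the coded bit rate. The sketch below is illustrative only; the function name is hypothetical, and counting the LFE as a full-bandwidth 24-bit channel (as in the example) is a simplification that slightly overstates the LPCM rate.

```python
def compression_ratio(channels: int, fs_hz: int, bits_per_sample: int, coded_bit_rate: float) -> float:
    """Uncompressed LPCM bit rate divided by the coded bit rate (hypothetical helper)."""
    lpcm_bit_rate = channels * fs_hz * bits_per_sample  # bit/s before coding
    return lpcm_bit_rate / coded_bit_rate

# 6 channels x 48 kHz x 24 bits = 6 912 000 bit/s, coded at 1 509 750 bit/s
print(round(compression_ratio(6, 48_000, 24, 1_509_750), 2))  # prints 4.58
```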

Due to the popularity of the 5.1 channel sound tracks in the movie industry and in the emerging multichannel home 
audio market, DTS Coherent Acoustics is delivered in the form of a core audio (for the 5.1 channels) plus optional 
extended audio (for the rest of the DTS Coherent Acoustics). The 5.1 channel audio consists of up to five primary audio 
channels with frequencies lower than 24 kHz plus a possible low frequency effect (LFE) channel (the 0.1 channel). This 
implies that the frequency components higher than 24 kHz for the five primary audio channels and all frequency 
components of the remaining two channels are carried in the extended audio. This structure is illustrated in figure 4.1 
and as follows: 

• Core Audio: 

Up to 5 primary audio channels (frequency components below 24 kHz). 

Up to 1 low frequency effect (LFE) channel. 

Optional information such as time stamps and user information. 

• Extended Audio: 

Up to 2 additional full bandwidth channels (frequency components below 24 kHz). 

Frequency components above 24 kHz for the primary and extended audio channels. 

Under this structure, a basic DTS decoder can decode the 5.1-channel core audio bits only and does not even need to
know that extended audio bits exist in the bit stream. A more sophisticated decoder can first decode the 5.1 core audio
bits and then proceed to decode the extended audio bits, if they are present.
Figure 4.1: DTS Coherent Acoustics is optimized for 5.1-channel applications, but is extensible to
deliver 8 channels with sampling frequencies up to 192 kHz



5 Core Audio 

The DTS core encoder delivers 5.1-channel audio at 24 bits per sample with a sampling frequency of up to 48 kHz. As
shown in figure 5.1, the audio samples of a primary channel are split and decimated by a 32-band QMF bank into 32
sub-bands. The samples of each sub-band go through an adaptive prediction process that checks whether the resulting
prediction gain is large enough to justify the overhead of transmitting the prediction filter coefficients. The prediction
gain is obtained by comparing the variance of the prediction residual to that of the sub-band samples. If the prediction
gain is large enough, the prediction residual is quantized using mid-tread scalar quantization and the prediction
coefficients are vector-quantized (VQ). Otherwise, the sub-band samples themselves are quantized using mid-tread
scalar quantization. In low bit rate applications, the scalar quantization indexes of the residual or sub-band samples
are further encoded using Huffman codes. When the bit rate is low, VQ may also be used to quantize samples of the
high-frequency sub-bands for which adaptive prediction is disabled. In very low bit rate applications, joint intensity
coding and sum/difference coding may be employed to further improve audio quality. The optional LFE channel is
compressed by low-pass filtering, decimation and mid-tread scalar quantization.
Figure 5.1: Compression of a primary audio channel. The dotted lines indicate optional operations and
the dash-dot lines indicate bit allocation control
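A minimal Python sketch of the per-sub-band decision described above follows. It assumes the predictor coefficients are already known (their computation and their VQ are not modelled), measures the gain as the ratio of sample variance to residual variance in dB, and uses a caller-supplied threshold; the function names and the threshold are hypothetical, not values from the present document.

```python
import math
import statistics

def prediction_residual(x, coeffs):
    """e[n] = x[n] - sum_k coeffs[k] * x[n-1-k]; samples before x[0] are treated as zero."""
    residual = []
    for n, sample in enumerate(x):
        pred = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(sample - pred)
    return residual

def prediction_gain_db(x, coeffs):
    """Ratio of sub-band sample variance to residual variance, in dB."""
    res_var = statistics.pvariance(prediction_residual(x, coeffs))
    if res_var == 0.0:
        return math.inf
    return 10.0 * math.log10(statistics.pvariance(x) / res_var)

def mid_tread_index(value, step):
    """Mid-tread quantizer: zero is a reconstruction level; halves round away from zero."""
    q = math.floor(abs(value) / step + 0.5)
    return q if value >= 0 else -q

def quantize_subband(x, coeffs, step, threshold_db):
    """Return (prediction_used, indexes): quantize the residual only if the gain is large enough."""
    if prediction_gain_db(x, coeffs) > threshold_db:
        return True, [mid_tread_index(v, step) for v in prediction_residual(x, coeffs)]
    return False, [mid_tread_index(v, step) for v in x]
```

The threshold stands in for the encoder's judgement of whether the gain outweighs the cost of sending the coefficients; a real encoder would weigh it against the bit budget.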

5.1 Frame structure and decoding procedure 

A DTS bit stream is a sequence of synchronized frames, each consisting of the following fields (see figure 5.2):

• Synchronization Word: Synchronizes the decoder to the bit stream.

• Frame Header: Carries information about frame construction, encoder configuration, audio data arrangement,
and various operational features.

• Sub-frames: Carry the core audio data for the 5.1 channels. Each frame may have up to 16 sub-frames.

• Optional Information: Carries auxiliary data such as time code, which is not intrinsic to the operation of the
decoder but may be used for post-processing routines.

• Extended Audio: Carries possible extended frequency bands of the primary audio channels as well as all
frequency components of channels beyond 5.1.

Each sub-frame contains data for audio samples of the 5.1 channels covering a time duration of up to that of the
sub-band analysis window, and can be decoded entirely without reference to any other sub-frame. A sub-frame consists
of the following fields (see figure 5.3):

• Side Information: Relays information about how to decode the 5.1-channel audio data. Information for joint
intensity coding is also included here.

• High Frequency VQ: A small number of high-frequency sub-bands of the primary channels may be encoded
using VQ. In this case, the samples of each of those sub-bands within the sub-frame are encoded as a single VQ
address.

• Low Frequency Effect Channel: The decimated samples of the LFE channel are carried as 8-bit words.

• Sub-sub-frames: All sub-bands, except the high-frequency VQ-encoded ones, are encoded here in up to 4
sub-sub-frames.
Figure 5.2: DTS frame structure

Figure 5.3: Sub-frame structure



5.2 Error classification



Each element in the bit stream carries either a piece of the audio data or the information needed to decode it. A
corrupted bit stream element will cause an error in the decoder, and its consequences depend on the information that
element carries. In order to control decoded audio quality, the consequence of a corrupted element is categorized as
follows:

V Vital: The element is designed to change from frame to frame, and its corruption is likely to lead to failure of
the decoding process and instability in the decoded PCM outputs.

ACC Corruption could cause failure. Since the element usually does not change from frame to frame, the error may
be compensated for by a majority vote over consecutive frames.

NV Non-vital: Corruption will degrade the quality of the PCM outputs, but the degradation will be graceful.
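For ACC elements, the majority vote over consecutive frames could be kept per header field, for example as below. The window length of 5 frames, the tie-breaking rule (prefer the most recent of the tied values) and the class name are assumptions made for illustration only.

```python
from collections import Counter, deque

class AccFieldVoter:
    """Majority vote over the last `window` frames for one ACC header field (hypothetical helper)."""

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)  # oldest values drop out automatically

    def update(self, value):
        """Record the value decoded from the current frame and return the voted value."""
        self.history.append(value)
        counts = Counter(self.history)
        best = max(counts.values())
        # On a tie, prefer the most recently received of the tied values.
        for candidate in reversed(self.history):
            if counts[candidate] == best:
                return candidate
```

For example, a voter fed the SFREQ values 13, 13, 13, 2, 13 keeps returning 13, masking the single corrupted value.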
5.3 Synchronization

A DTS bit stream consists of a sequence of audio frames of equal size, each beginning with a 32-bit synchronization word:

SYNC = 0x7ffe8001 V 32 bits 

The first decoding step is therefore to search the input bit stream for SYNC. In order to reduce the probability of false
synchronization, the 6 bits after SYNC in the bit stream may also be checked, since they usually do not change for
normal frames (although they do carry useful information about the frame structure). For normal frames these 6 bits
should be 0x3f (binary 111111); they are called the synchronization word extension. Concatenating them with SYNC
gives an extended synchronization word (32 + 6 = 38 bits):

SYNC = 0x7ffe8001 + 0x3f for normal frame V 38 bits

which reduces the probability of false synchronization on random data to about 2^-38 (roughly 3,6 × 10^-12). In
addition, the fact that SYNC occurs at a fixed interval further reduces the probability of false synchronization to
almost zero.

The above search procedure shall be carried out only when the decoder is out of synchronization with the bit stream. 
After synchronization is established, the decoder should only check if SYNC = 0x7ffe8001 before it begins to decode a 
frame, because the 6 bits after SYNC may change for abnormal (termination) frames. 

The SYNC word appears at the beginning of each DTS data frame in the stream. The length of the DTS data frame is
fixed for the entire DTS stream and consequently the SYNC words occur at fixed intervals within the stream.
During the initial synchronization process the decoder shall calculate the distance between two consecutive SYNC
words. While in synchronization with the incoming DTS stream, the decoder shall only look for the SYNC word of a
new data frame at the calculated distance from the SYNC word of the previously decoded data frame. If the SYNC
word is found at the specified distance, the decoder shall proceed with decoding the new data frame; if not, the
"out-of-sync" state shall be declared.
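The two-stage procedure above (search for the extended 38-bit word while out of sync, then check only the 32-bit SYNC at the measured frame distance) might be sketched as follows. The sketch assumes a 16-bit big-endian stream in which SYNC is byte-aligned; the function names are hypothetical.

```python
SYNC = 0x7FFE8001
SYNC_BYTES = SYNC.to_bytes(4, "big")

def is_extended_sync(buf: bytes, pos: int) -> bool:
    """True if SYNC at byte offset pos is followed by the 6-bit extension 0b111111 (normal frame)."""
    if buf[pos:pos + 4] != SYNC_BYTES or pos + 4 >= len(buf):
        return False
    return (buf[pos + 4] >> 2) == 0x3F  # top 6 bits of the byte after SYNC

def acquire_sync(buf: bytes):
    """Out-of-sync search: find two consecutive extended SYNC words; return (offset, frame_length)."""
    first = -1
    pos = buf.find(SYNC_BYTES)
    while pos != -1:
        if is_extended_sync(buf, pos):
            if first < 0:
                first = pos
            else:
                return first, pos - first  # distance between consecutive SYNC words
        pos = buf.find(SYNC_BYTES, pos + 1)
    return None

def frames_in_sync(buf: bytes, first: int, frame_len: int):
    """In-sync tracking: only check the 32-bit SYNC at multiples of frame_len."""
    pos = first
    while pos + frame_len <= len(buf) and buf[pos:pos + 4] == SYNC_BYTES:
        yield buf[pos:pos + frame_len]
        pos += frame_len
    # Leaving the loop with data remaining means SYNC was missing at the expected distance:
    # the caller should declare the out-of-sync state and call acquire_sync() again.
```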

When a DTS bit stream is stored in 16-bit words, such as on CD, SYNC will be stored as 0x7ffe and 0x8001. However,
when the DTS bit stream is viewed on an IBM PC platform, the high byte and low byte of each word are swapped, so
SYNC will appear as 0xfe7f and 0x0180.

Note that, in order to make the harsh sound less unpleasant when a DTS bit stream is mistakenly played back as PCM,
DTS also provides a 14-bit format that reduces the dynamic range from 16 to 14 bits. In this 14-bit format, the DTS
bit stream is stored only in the least significant 14 bits of each 16-bit word; the most significant 2 bits do not carry
stream data. In this case, SYNC is stored in three words: 0x1fff, 0xe800 and 0x07f.
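Before synchronization, a decoder handling the word formats described above could normalize the input to a packed big-endian byte stream. The helpers below are an illustrative sketch with hypothetical names. They discard the 2 most significant bits of each 14-bit-format word. In the test data the third word 0x07ff is made up apart from its leading bits: only those are fixed by SYNC and the 6-bit extension, while its low-order bits belong to the header fields that follow.

```python
def swap_bytes16(data: bytes) -> bytes:
    """Undo byte-swapped 16-bit storage (0xfe7f 0x0180 -> 0x7ffe 0x8001); a trailing odd byte is dropped."""
    n = len(data) - (len(data) % 2)
    out = bytearray(n)
    out[0::2] = data[1:n:2]
    out[1::2] = data[0:n:2]
    return bytes(out)

def unpack_14bit(data: bytes) -> bytes:
    """Concatenate the low 14 bits of each big-endian 16-bit word into a packed byte stream."""
    acc, nbits, out = 0, 0, bytearray()
    for i in range(0, len(data) - 1, 2):
        word = (data[i] << 8) | data[i + 1]
        acc = (acc << 14) | (word & 0x3FFF)  # keep only the 14 payload bits
        nbits += 14
        while nbits >= 8:  # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep the leftover bits for the next word
    return bytes(out)

# 14-bit big-endian words 0x1fff, 0xe800, 0x07ff start with the bytes 7f fe 80 01
print(unpack_14bit(bytes.fromhex("1fffe80007ff")).hex())  # prints 7ffe8001ff
```

A byte-swapped 14-bit stream would go through swap_bytes16() first and then unpack_14bit().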

5.4 Frame header 

The frame header consists of a bit stream header and a primary audio coding header. The bit stream header provides 
information about the construction of the frame, the encoder configuration such as core source sampling frequency, and 
various optional operational features such as embedded dynamic range control. The primary audio coding header 
specifies the packing arrangement and coding formats used at the encoder to assemble the audio coding side 
information. Many elements in the headers are repeated for each separate audio channel. 

5.4.1 Bit stream header 

Frame Type V FTYPE 1 bit 

It indicates the type of the current frame:

Table 5.1: Frame Type

FTYPE   Frame Type
1       Normal frame
0       Termination frame
Termination frames are used when it is necessary to accurately align the end of an audio sequence with a video frame
end point. A termination block carries n × 32 core audio samples, where the block length n is adjusted to fall just short
of the video end point. Two termination frames may be transmitted sequentially to avoid transmitting one excessively
small frame.



Deficit Sample Count 



SHORT 



5 bits 



It defines the number of core samples by which a termination frame falls SHORT of the normal length of a block. A 
block = 32 PCM core samples per channel, corresponding to the number of PCM core samples that are feed to the core 
filter bank to generate one sub-band sample for each sub-band. A normal frame consists of blocks of 32 PCM core 
samples, while a termination frame provides the flexibility of having a frame size precision finer than the 32 PCM core 
sample block. On completion of a termination frame, (SHORTh-1) PCM core samples must be padded to the output 
buffers of each channel. The padded samples may be zeros or they may be copies of adjacent samples. 

Table 5.2: Deficit Sample Count

Frame Type                      Valid Value or Range of SHORT
Termination frame (FTYPE = 0)   [0, 30]
Normal frame (FTYPE = 1)        31 (indicating a normal frame)



CRC Present Flag V CPF 1 bit 

A flag that indicates whether CRC (cyclic redundancy check) bits are present in the bit stream.

Table 5.3: CRC Present Flag

CPF   CRC
1     Present
0     Not Present



Number of PCM Sample Blocks NBLKS 7 bits

It indicates that there are (NBLKS + 1) blocks in the current frame (see note); a block = 32 PCM core samples per
channel, corresponding to the number of PCM samples that are fed to the core filter bank to generate one sub-band
sample for each sub-band. The actual core encoding window size is 32 × (NBLKS + 1) PCM samples per channel. Valid
range for NBLKS: 5 to 127. Invalid range for NBLKS: 0 to 4. For normal frames, this indicates a window size of
2 048, 1 024, 512 or 256 samples per channel. For termination frames, NBLKS can take any value in its valid range.



NOTE: When the frequency extension stream (X96k) is present, the PCM core samples represent the samples at the
output of the decimator that precedes the core encoder. This k-times decimator translates the original
PCM source samples with the sampling frequency Fs_src = k × SFREQ to the core PCM samples
(Fs_core = SFREQ) suitable for encoding by the core encoder. The core encoder can handle sampling
frequencies SFREQ ≤ 48 kHz and consequently:

- k = 2 for 48 kHz < Fs_src ≤ 96 kHz; and

- k = 4 for 96 kHz < Fs_src ≤ 192 kHz.
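The window size and the X96k decimation factor follow directly from the formulas above. In this sketch (hypothetical helper names) k = 1 is assumed for sources at or below 48 kHz, where no decimation is needed; that case is not stated explicitly in the present document.

```python
NORMAL_FRAME_WINDOWS = {256, 512, 1024, 2048}  # window sizes listed for normal frames

def core_window_size(nblks: int) -> int:
    """Core encoding window in PCM samples per channel: 32 x (NBLKS + 1)."""
    if not 5 <= nblks <= 127:
        raise ValueError(f"NBLKS={nblks} is outside the valid range 5..127")
    return 32 * (nblks + 1)

def is_valid_normal_frame_nblks(nblks: int) -> bool:
    """True if NBLKS gives one of the window sizes allowed for normal frames."""
    return 5 <= nblks <= 127 and core_window_size(nblks) in NORMAL_FRAME_WINDOWS

def x96k_decimation_factor(fs_src_hz: int) -> int:
    """Factor k with Fs_core = Fs_src / k <= 48 kHz (k = 1 up to 48 kHz is an assumption)."""
    if fs_src_hz <= 48_000:
        return 1
    if fs_src_hz <= 96_000:
        return 2
    if fs_src_hz <= 192_000:
        return 4
    raise ValueError("source sampling frequencies above 192 kHz are not covered")
```

For instance, NBLKS = 15 gives a 512-sample window, and a 96 kHz source is decimated by k = 2 to a 48 kHz core.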



Primary Frame Byte Size FSIZE 14 bits

(FSIZE + 1) is the total byte size of the current frame, including primary audio data as well as any extension audio
data. Valid range for FSIZE: 95 to 16 383. Invalid range for FSIZE: 0 to 94.



Audio Channel Arrangement ACC AMODE 6 bits

Audio channel arrangement that describes the number of audio channels (CHS) and the audio playback arrangement
(see table 5.4). Unspecified modes may be defined at a later date (user-defined codes), and the control data required to
implement them, i.e. channel assignments, down-mixing, etc., can be uploaded from the player platform.
Table 5.4: Audio channel arrangement

AMODE                CHS   Arrangement
0b000000             1     A
0b000001             2     A + B (dual mono)
0b000010             2     L + R (stereo)
0b000011             2     (L + R) + (L - R) (sum - difference)
0b000100             2     LT + RT (left and right total)
0b000101             3     C + L + R
0b000110             3     L + R + S
0b000111             4     C + L + R + S
0b001000             4     L + R + SL + SR
0b001001             5     C + L + R + SL + SR
0b001010             6     CL + CR + L + R + SL + SR
0b001011             6     C + L + R + LR + RR + OV
0b001100             6     CF + CR + LF + RF + LR + RR
0b001101             7     CL + C + CR + L + R + SL + SR
0b001110             8     CL + CR + L + R + SL1 + SL2 + SR1 + SR2
0b001111             8     CL + C + CR + L + R + SL + S + SR
0b010000-0b111111          User defined

Legends: L = left, R = right, C = center, S = surround, F = front, R = rear, T = total, OV = overhead
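A decoder needs CHS to know how many primary channels to decode. A lookup transcribed from table 5.4 might look like this. The function name is hypothetical, and user-defined codes are simply reported as None because their meaning is supplied by the player platform.

```python
# (CHS, arrangement) for AMODE = 0b000000 ... 0b001111, transcribed from table 5.4.
AMODE_TABLE = [
    (1, "A"),
    (2, "A + B (dual mono)"),
    (2, "L + R (stereo)"),
    (2, "(L + R) + (L - R) (sum - difference)"),
    (2, "LT + RT (left and right total)"),
    (3, "C + L + R"),
    (3, "L + R + S"),
    (4, "C + L + R + S"),
    (4, "L + R + SL + SR"),
    (5, "C + L + R + SL + SR"),
    (6, "CL + CR + L + R + SL + SR"),
    (6, "C + L + R + LR + RR + OV"),
    (6, "CF + CR + LF + RF + LR + RR"),
    (7, "CL + C + CR + L + R + SL + SR"),
    (8, "CL + CR + L + R + SL1 + SL2 + SR1 + SR2"),
    (8, "CL + C + CR + L + R + SL + S + SR"),
]

def channel_arrangement(amode: int):
    """Return (CHS, arrangement) for a 6-bit AMODE, or None for user-defined codes."""
    if not 0 <= amode <= 0b111111:
        raise ValueError("AMODE is a 6-bit field")
    if amode >= 0b010000:
        return None  # user defined: channel assignment comes from the player platform
    return AMODE_TABLE[amode]
```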



Core Audio Sampling Frequency ACC SFREQ 4 bits

It specifies the sampling frequency of the audio samples in the core encoder, based on table 5.5. When the source
sampling frequency is beyond 48 kHz, the audio is encoded in up to 3 separate frequency bands. The base-band audio,
for example 0 kHz to 16 kHz, 0 kHz to 22,05 kHz or 0 kHz to 24 kHz, is encoded and packed into the core audio data
arrays. SFREQ corresponds to the sampling frequency of the base-band audio. The audio above the base band (the
extended bands), for example 16 kHz to 32 kHz, 22,05 kHz to 44,1 kHz or 24 kHz to 48 kHz, is encoded and packed into
the extended coding arrays, which reside at the end of the core audio data arrays. If the decoder is unable to make use
of the high sample rate data, this information may be ignored and the base-band audio converted normally at a
standard sampling rate (32 kHz, 44,1 kHz or 48 kHz). If the decoder receives data coded at a sampling rate lower than
that available from the system, interpolation (2× or 4×) will be required (see table 5.6).

Table 5.5: Core audio sampling frequencies

SFREQ    Core Audio Sampling Frequency
0b0000   Invalid
0b0001   8 kHz
0b0010   16 kHz
0b0011   32 kHz
0b0100   Invalid
0b0101   Invalid
0b0110   11,025 kHz
0b0111   22,05 kHz
0b1000   44,1 kHz
0b1001   Invalid
0b1010   Invalid
0b1011   12 kHz
0b1100   24 kHz
0b1101   48 kHz
0b1110   Invalid
0b1111   Invalid



ETSI TS 102 114 V1.1.1 (2002-08)



Table 5.6: Sub-sampled audio decoding for standard sampling rates

Core Audio Sampling Frequency   Hardware Sampling Frequency   Required Filtering
8 kHz                           32 kHz                        4× interpolation
16 kHz                          32 kHz                        2× interpolation
32 kHz                          32 kHz                        none
11,025 kHz                      44,1 kHz                      4× interpolation
22,05 kHz                       44,1 kHz                      2× interpolation
44,1 kHz                        44,1 kHz                      none
12 kHz                          48 kHz                        4× interpolation
24 kHz                          48 kHz                        2× interpolation
48 kHz                          48 kHz                        none
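Tables 5.5 and 5.6 can be combined into a small lookup that returns the core rate and the interpolation factor needed for a given hardware rate. This is a sketch with hypothetical names; it assumes the hardware rate is one of those in table 5.6 and rejects any other combination.

```python
SFREQ_HZ = {
    0b0001: 8_000, 0b0010: 16_000, 0b0011: 32_000,
    0b0110: 11_025, 0b0111: 22_050, 0b1000: 44_100,
    0b1011: 12_000, 0b1100: 24_000, 0b1101: 48_000,
}  # all other 4-bit codes are invalid (table 5.5)

def core_sampling_rate(sfreq: int) -> int:
    """Core audio sampling frequency in Hz for a 4-bit SFREQ code."""
    if sfreq not in SFREQ_HZ:
        raise ValueError(f"invalid SFREQ code {sfreq:#06b}")
    return SFREQ_HZ[sfreq]

def interpolation_factor(sfreq: int, hw_rate_hz: int) -> int:
    """Interpolation needed to play the core rate at the hardware rate (1, 2 or 4, per table 5.6)."""
    factor, remainder = divmod(hw_rate_hz, core_sampling_rate(sfreq))
    if remainder or factor not in (1, 2, 4):
        raise ValueError("combination not covered by table 5.6")
    return factor
```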



Transmission Bit Rate ACC RATE 5 bits

RATE specifies the targeted transmission data rate for the current frame of audio (see table 5.7). The open mode allows
for bit rates not defined by the table. The variable and loss-less modes imply that the data rate changes from frame to
frame.

Table 5.7: RATE parameter vs. targeted bit-rate

RATE      Targeted Bit Rate [kbit/s]
0b00000   32
0b00001   56
0b00010   64
0b00011   96
0b00100   112
0b00101   128
0b00110   192
0b00111   224
0b01000   256
0b01001   320
0b01010   384
0b01011   448
0b01100   512
0b01101   576
0b01110   640
0b01111   768
0b10000   960
0b10001   1 024
0b10010   1 152
0b10011   1 280
0b10100   1 344
0b10101   1 408
0b10110   1 411,2
0b10111   1 472
0b11000   1 536
0b11001   1 920
0b11010   2 048
0b11011   3 072
0b11100   3 840
0b11101   open
0b11110   Variable
0b11111   Loss-less
Due to the limitations of the transmission medium, the actual bit rate may be slightly different from the targeted bit
rate, as listed in table 5.8 for the two types of applications. Bit rates not shown in table 5.8 are not applicable to
either of these two applications.

Table 5.8: Targeted and actual bit-rate for the CD and DVD-Video applications

RATE      Targeted Bit Rate   Actual Bit Rate on DTS CDs [kbit/s]   Actual Bit Rate on
          [kbit/s]            14-bit format     16-bit format       DVD-Video Discs [kbit/s]
0b01111   768                 N/A               N/A                 754,50
0b10110   1 411,2             1 234,8           1 411,2             N/A
0b11000   1 536               N/A               N/A                 1 509,75
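The actual rates in table 5.8 can be checked with simple arithmetic, assuming frames of (FSIZE + 1) bytes are sent back-to-back and each frame covers 32 × (NBLKS + 1) samples per channel at the core sampling frequency. The function name is hypothetical, and the frame sizes used below are worked back from the table rather than quoted from the present document.

```python
from fractions import Fraction

def frame_bit_rate(fsize: int, nblks: int, fs_hz: int) -> Fraction:
    """Bit rate of back-to-back frames of (FSIZE + 1) bytes, each 32 x (NBLKS + 1) samples long."""
    frame_bits = 8 * (fsize + 1)
    samples_per_frame = 32 * (nblks + 1)
    return Fraction(frame_bits * fs_hz, samples_per_frame)  # exact, no rounding

print(frame_bit_rate(2012, 15, 48_000))  # prints 1509750
```

With 512-sample frames (NBLKS = 15), 2 013-byte and 1 006-byte frames at 48 kHz give exactly 1 509,75 kbit/s and 754,50 kbit/s, and 1 792-byte frames at 44,1 kHz give 1 234,8 kbit/s.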



Embedded Down Mix Enabled MIX 1 bit

This indicates whether embedded down-mixing coefficients are included at the start of each sub-frame (see table 5.9).
Down-mixing to stereo may be implemented using these coefficients for the duration of the sub-frame.

Table 5.9: Status of embedded down mixing coefficients

MIX   Mix Parameters
0     not present
1     present



Embedded Dynamic Range Flag DYNF 1 bit

DYNF indicates whether embedded dynamic range coefficients are included at the start of each sub-frame. Dynamic range
correction may be implemented on all channels using these coefficients for the duration of the sub-frame.

Table 5.10: Embedded Dynamic Range Flag

DYNF   Dynamic Range Coefficients
0      not present
1      present



Embedded Time Stamp Flag V TIMEF 1 bit

It indicates whether embedded time stamps are included at the end of the core audio data.

Table 5.11: Embedded Time Stamp Flag

TIMEF   Time Stamps
0       not present
1       present



Auxiliary Data Flag V AUXF 1 bit

It indicates whether auxiliary data bytes are appended at the end of the core audio data.

Table 5.12: Auxiliary Data Flag

AUXF   Auxiliary Data Bytes
0      not present
1      present
HDCD NV HDCD 1 bit

The source material is mastered in HDCD format if HDCD = 1, and otherwise HDCD = 0.

Extension Audio Descriptor Flag ACC EXT_AUDIO_ID 3 bits

This flag has meaning only if EXT_AUDIO = 1 (see below), in which case it indicates the type of data that has been
placed in the extension stream(s).

Table 5.13: Extension Audio Descriptor Flag

EXT_AUDIO_ID   Type of Extension Data
0              Channel Extension (XCh)
1              Reserved
2              Frequency Extension (X96k)
3              XCh and X96k
4              Reserved
5              Reserved
6              Reserved
7              Reserved



Extended Coding Flag ACC EXT_AUDIO 1 bit

It indicates whether extended audio coding data are present after the core audio data. Extended audio data will include
the data for the extended bands of the 5 normal primary channels as well as all bands of additional audio channels. To
simplify the implementation of a 5.1-channel/48 kHz decoder, the extended coding data arrays are placed at the end of
the core audio array.

Table 5.14: Extended Coding Flag

EXT_AUDIO   Extended Audio Data
0           not present
1           present



Audio Sync Word Insertion Flag ACC ASPF 1 bit

It indicates how often the audio data check word DSYNC (0xFFFF) occurs in the data stream. DSYNC is used as a
simple means of detecting the presence of bit errors in the bit stream and as the final data verification stage prior to
transmitting the reconstructed PCM words to the DACs.

Table 5.15: Audio Sync Word Insertion Flag

ASPF   DSYNC Placed at End of Each
0      Sub-frame
1      Sub-sub-frame



Low Frequency Effects Flag LFF 2 bits

Indicates whether the LFE channel is present and the choice of the interpolation factor used to reconstruct the LFE
channel (see table 5.16).

Table 5.16: Flag for LFE channel

LFF   LFE Channel   Interpolation Factor
0     not present
1     Present       128
2     Present       64
3     Invalid
Predictor History Flag Switch HFLAG 1 bit

If frames are to be used as possible entry points into the data stream or as audio sequence "start frames", the ADPCM
predictor history may not be contiguous. Hence these frames can be coded without the previous frame's predictor
history, making audio ramp-up faster on entry. When generating ADPCM predictions for the current frame, the decoder
will use the reconstruction history of the previous frame if HFLAG = 1. Otherwise, the history will be ignored.



Header CRC Check Bytes HCRC 16 bits

This 16-bit CRC check word checks whether there are errors from the beginning of the current frame up to this point.
It is present only if CPF = 1.



Multirate Interpolator Switch NV FILTS 1 bit

This flag indicates which set of 32-band interpolation FIR coefficients is to be used to reconstruct the sub-band audio
(see table 5.17).

Table 5.17: Multirate interpolation filter bank switch

FILTS   32-band Interpolation Filter
0       Non-perfect Reconstruction
1       Perfect Reconstruction



Encoder Software Revision ACC/NV VERNUM 4 bits

It indicates the revision status of the encoder software (see table 5.18). In addition, VERNUM is used to indicate the
presence of the dialog normalization parameters (see table 5.22).

Table 5.18: Encoder software revision

VERNUM    Encoder Software Revision
0 to 6    Future revision (compatible with the present document)
7         Current
8 to 15   Future revision (incompatible with the present document)

NOTE: If the decoder encounters a DTS stream with VERNUM > 7 and the decoder is not designed for that
specific encoder software revision, then it must mute its outputs.



Copy History NV CHIST 2 bits

It indicates the copy history of the audio. Because of copyright regulations, the exact definition of this field is
deliberately omitted.



Source PCM Resolution ACC/NV PCMR 3 bits

It indicates the quantization resolution of the source PCM samples (see table 5.19). The left and right surround
channels of the source material are mastered in DTS ES format if ES = 1, and are not if ES = 0.

Table 5.19: Quantization resolution of source PCM samples

PCMR     Source PCM Resolution   ES
0b000    16 bits                 0
0b001    16 bits                 1
0b010    20 bits                 0
0b011    20 bits                 1
0b110    24 bits                 0
0b101    24 bits                 1
Others   Invalid                 Invalid
Front Sum/Difference Flag SUMF 1 bit

Indicates whether the front left and right channels are converted to sum and difference signals prior to encoding (see
table 5.20). If set to zero, no post-processing is required at the decoder.

Table 5.20: Sum/difference decoding status of front left and right channels

SUMF   Front Sum/Difference Encoding
0      L = L, R = R
1      L = L + R, R = L - R



Surrounds Sum/Difference Flag SUMS 1 bit

Indicates whether the left and right surround channels are converted to sum and difference signals prior to encoding
(see table 5.21). If set to zero, no post-processing is required at the decoder.

Table 5.21: Sum/difference decoding status of left and right surround channels

SUMS   Surround Sum/Difference Encoding
0      Ls = Ls, Rs = Rs
1      Ls = Ls + Rs, Rs = Ls - Rs
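Tables 5.20 and 5.21 give the forward mapping; a decoder undoing it would solve for L and R (or Ls and Rs) as below. The division by 2 follows algebraically from the tables; whether a real decoder applies this scaling here or elsewhere in the signal chain is not specified in this clause, and the function name is hypothetical.

```python
def undo_sum_difference(sum_ch, diff_ch):
    """Recover (left, right) from sum = L + R and difference = L - R, sample by sample."""
    left = [(s + d) / 2 for s, d in zip(sum_ch, diff_ch)]   # (L + R) + (L - R) = 2L
    right = [(s - d) / 2 for s, d in zip(sum_ch, diff_ch)]  # (L + R) - (L - R) = 2R
    return left, right
```

The same helper applies to the front pair when SUMF = 1 and to the surround pair when SUMS = 1.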



Dialog Normalization Parameter/Unspecified DIALNORM/UNSPEC 4 bits

For VERNUM = 6 or 7, this 4-bit field is used to determine the dialog normalization parameter. For all other values of
VERNUM, this field is a placeholder that is not specified at this time.

The dialog normalization gain (DNG), in dB, is specified by the encoder operator and is used to directly scale the
decoder output samples. In the DTS stream, the DNG value is transmitted by means of the combined data in the
VERNUM and DIALNORM fields (see table 5.22).

For all other values of VERNUM (i.e. 0, 1, 2, 3, 4, 5, 8, 9, ..., 15), the UNSPEC 4-bit field should be extracted but
ignored by the decoder. In addition, for these VERNUM values the dialog normalization gain should be set to 0,
i.e. DNG = 0 dB (no dialog normalization).

Table 5.22: Dialog Normalization Parameter

Dialog Normalization Gain (DNG)
Applied to the Decoder Outputs [dB]   VERNUM   DIALNORM
0                                     7        0b0000
-1                                    7        0b0001
-2                                    7        0b0010
-3                                    7        0b0011
-4                                    7        0b0100
-5                                    7        0b0101
-6                                    7        0b0110
-7                                    7        0b0111
-8                                    7        0b1000
-9                                    7        0b1001
-10                                   7        0b1010
-11                                   7        0b1011
-12                                   7        0b1100
-13                                   7        0b1101
-14                                   7        0b1110
-15                                   7        0b1111
-16                                   6        0b0000
-17                                   6        0b0001
-18                                   6        0b0010
-19                                   6        0b0011
-20                                   6        0b0100
-21                                   6        0b0101
-22                                   6        0b0110
-23                                   6        0b0111
-24                                   6        0b1000
-25                                   6        0b1001
-26                                   6        0b1010
-27                                   6        0b1011
-28                                   6        0b1100
-29                                   6        0b1101
-30                                   6        0b1110
-31                                   6        0b1111
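Putting clause 5.4.1 together, a bit stream header reader might look as follows. The sketch assumes the fields are packed MSB-first and contiguously in the order in which they are listed above, with HCRC present only when CPF = 1; it performs no CRC or range checks, and all names are hypothetical. The DNG helper encodes table 5.22 (DNG = -DIALNORM for VERNUM = 7, DNG = -16 - DIALNORM for VERNUM = 6, otherwise 0 dB).

```python
class BitReader:
    """Reads unsigned MSB-first bit fields from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

# (name, width in bits) in the order the fields are listed in clause 5.4.1.
FIELDS_BEFORE_HCRC = [
    ("FTYPE", 1), ("SHORT", 5), ("CPF", 1), ("NBLKS", 7), ("FSIZE", 14),
    ("AMODE", 6), ("SFREQ", 4), ("RATE", 5), ("MIX", 1), ("DYNF", 1),
    ("TIMEF", 1), ("AUXF", 1), ("HDCD", 1), ("EXT_AUDIO_ID", 3),
    ("EXT_AUDIO", 1), ("ASPF", 1), ("LFF", 2), ("HFLAG", 1),
]
FIELDS_AFTER_HCRC = [
    ("FILTS", 1), ("VERNUM", 4), ("CHIST", 2), ("PCMR", 3),
    ("SUMF", 1), ("SUMS", 1), ("DIALNORM", 4),
]

def parse_core_header(frame: bytes) -> dict:
    """Parse SYNC and the bit stream header fields of one core frame."""
    reader = BitReader(frame)
    if reader.read(32) != 0x7FFE8001:
        raise ValueError("frame does not start with SYNC")
    header = {name: reader.read(width) for name, width in FIELDS_BEFORE_HCRC}
    if header["CPF"] == 1:
        header["HCRC"] = reader.read(16)  # only present when CPF = 1
    header.update({name: reader.read(width) for name, width in FIELDS_AFTER_HCRC})
    return header

def dialog_normalization_db(vernum: int, dialnorm: int) -> int:
    """DNG from table 5.22; 0 dB (no dialog normalization) for other VERNUM values."""
    if vernum == 7:
        return -dialnorm
    if vernum == 6:
        return -16 - dialnorm
    return 0
```

Derived quantities then follow from the parsed fields, e.g. a frame of header["FSIZE"] + 1 bytes and a window of 32 × (header["NBLKS"] + 1) samples.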



6 Extension to more than 5.1 channels (XCh) 

When there is a need to encode more than 5.1 channels, the extension channels are compressed using exactly the same
technology as the core audio channels. The audio data representing these extension channels are appended to the end of
the DTS stream audio. These extension audio data are automatically ignored by first-generation DTS decoders but
can be decoded by second-generation DTS decoders. The decoding process is as follows.



6.1 Synchronization 

Channel Extension Sync Word V XChSYNC 32 bits



The synchronization word XChSYNC = 0x5a5a5a5a for the channel extension audio comes after all other extension
streams, i.e. in case of multiple extension streams the XCh stream is always the last. For 16-bit streams, XChSYNC is
aligned to a 32-bit word boundary. For 14-bit streams, it is aligned to both 32-bit and 28-bit word boundaries, meaning
that the sync word appears as 0x1696e5a5 in the 28-bit stream and as 0x5a5a5a5a after this stream is packed into a
32-bit stream.

Since the XCh sync pattern might also occur by chance elsewhere in the bit stream (pseudo sync), it is MANDATORY to
check the distance between this sync word and the end of the current audio frame. This distance in bytes should be
equal to XChFSIZE + 1. The parameter XChFSIZE is described below.

NOTE: For compatibility with legacy bit streams, the measured distance in bytes is checked against both
XChFSIZE + 1 and XChFSIZE. XCh synchronization is declared only if the distance matches either of
these two values.
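The mandatory pseudo-sync check can be sketched as follows. It assumes that frame holds exactly one 16-bit-format frame (from the core SYNC to the end of the frame), that the 32-bit alignment of XChSYNC is counted from the start of the frame, and that XChFSIZE occupies the 10 bits immediately after XChSYNC, as listed in clause 6.2; the function name is hypothetical.

```python
XCH_SYNC = bytes.fromhex("5a5a5a5a")

def find_xch(frame: bytes, search_from: int = 0):
    """Return the offset of a verified XChSYNC in a 16-bit-format frame, or None."""
    start = (search_from + 3) & ~3  # XChSYNC sits on a 32-bit boundary (assumed frame-relative)
    for pos in range(start, len(frame) - 5, 4):
        if frame[pos:pos + 4] != XCH_SYNC:
            continue
        # XChFSIZE: the 10 bits immediately following XChSYNC
        xch_fsize = int.from_bytes(frame[pos + 4:pos + 6], "big") >> 6
        distance = len(frame) - pos  # bytes from this sync word to the end of the frame
        # Accept XChFSIZE + 1 (specified) or XChFSIZE (legacy streams).
        if xch_fsize >= 95 and distance in (xch_fsize + 1, xch_fsize):
            return pos
    return None  # no candidate passed the distance check: any pattern found was pseudo sync
```

In a real decoder, search_from would typically be set past the core audio data so that the scan covers only the extension area.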

6.2 Frame header 

Primary Frame Byte Size V XChFSIZE 10 bits

(XChFSIZE + 1) is the distance in bytes from the current extension sync word to the end of the current audio frame.
Valid range for XChFSIZE: 95 to 1 023. Invalid range for XChFSIZE: 0 to 94.



Extension Channel Arrangement ACC AMODE 4 bits

Audio channel arrangement that describes the number of audio channels (CHS) and the audio playback arrangement.
For now, it is set to represent the number of extension channels. More detail will be added in the future.
7 Extension to sampling frequencies of up to 96 kHz and/or higher resolution (X96k)

The generalized concept of core + 96 kHz-extension coding is illustrated in figure 7.1. To encode 96 kHz LPCM, the
input audio stream is fed to a 96 kHz to 48 kHz down-sampler and the resulting 48 kHz signal is encoded using the
standard core encoder, as in figure 7.1 A). Referring to figure 7.1 A):

• In the "Preprocess Input Audio" block the original 96 kHz/24-bit LPCM audio is first delayed and next passed 
through the extension 64-band analysis filter bank. Signal "1" in this case consists of the extension sub-band 
samples @ 96 kHz/64. 

• The core data consists of the core audio codes in 32 sub-bands and the side information. In the "Reconstruct 
Core Audio Components" block the core audio codes are inverse quantized to produce the reconstructed core 
sub-band samples @ 48 kHz/32. These sub-band samples correspond to signal "2". 

• In the "Generate Residuals" block the reconstructed core sub-band samples are subtracted from the extension 
sub-band samples in the lower 32 sub-bands. The extension sub-band samples in the upper 32 bands remain 
unaltered. These residual sub-band samples in the 64 bands correspond to signal "3". 

• The "Generate Extension Data" block processes the residual sub-band samples and generates the extension data 
that, along with the core data, is assembled in a packer to produce a core+extension bit stream. 
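The "Generate Residuals" step can be sketched for one time slot of sub-band samples. This sketch assumes the reconstructed core samples and the extension samples are already time-aligned and on a compatible scale; any gain normalization between the 32-band and 64-band filter banks is not shown, and the function name is hypothetical.

```python
NUM_CORE_BANDS = 32
NUM_EXT_BANDS = 64


def generate_residuals(ext_subband: list[float], core_subband: list[float]) -> list[float]:
    """Residuals for one time slot: extension minus reconstructed core in the
    lower 32 bands, extension samples passed through unaltered in the upper 32."""
    if len(ext_subband) != NUM_EXT_BANDS or len(core_subband) != NUM_CORE_BANDS:
        raise ValueError("expected 64 extension and 32 core sub-band samples")
    lower = [e - c for e, c in zip(ext_subband[:NUM_CORE_BANDS], core_subband)]
    upper = list(ext_subband[NUM_CORE_BANDS:])
    return lower + upper
```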

In the 96 kHz decoder, figure 7.1 B), the unpacker first separates the core+extension stream into the core and extension 
data. The core sub-band decoder, in the "Reconstruct Core Audio Components" block, processes the core data and 
produces the reconstructed core sub-band samples (the same as signal "2" generated in the encoder). Next, in the 
"Reconstruct Residual Components" block, the extension sub-band decoder uses the extension data to generate the 
reconstructed residual sub-band samples in the 64 bands. In the "Recombine Core and Residual Components" block the 
core sub-band samples are added to the lower 32 bands of residual sub-band samples to produce the extension sub-band 
samples in the 64 bands. In the same block the 64-band synthesis filter bank processes the extension sub-band samples 
and generates the 96 kHz 24-bit LPCM audio. Thus, as in the encoder, the reconstructed residuals and core signals are 
combined on the decoder side, figure 7.1 B), in the sub-band domain. 
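The "Recombine Core and Residual Components" step, before the 64-band synthesis filter bank, can be sketched for one time slot as the inverse of the encoder's subtraction. As with the encoder side, this assumes time-aligned, compatibly scaled sub-band samples; the function name is hypothetical and the synthesis QMF itself is not shown.

```python
NUM_CORE_BANDS = 32
NUM_EXT_BANDS = 64


def recombine(residual: list[float], core_subband: list[float]) -> list[float]:
    """Add reconstructed core samples to the lower 32 residual bands; the
    upper 32 residual bands are used unaltered as extension sub-band samples."""
    if len(residual) != NUM_EXT_BANDS or len(core_subband) != NUM_CORE_BANDS:
        raise ValueError("expected 64 residual and 32 core sub-band samples")
    lower = [r + c for r, c in zip(residual[:NUM_CORE_BANDS], core_subband)]
    return lower + list(residual[NUM_CORE_BANDS:])
```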
Figure 7.1: The concept of Core+Extension coding methodology; panels: A) encoder, B) 96 kHz Decoder, C) 48 kHz (Legacy) Decoder 
When a 48 kHz-only (legacy) decoder is fed the core + extension bit stream, figure 7.1 C), the extension data fields are 
ignored and only the core data is decoded. This results in 48 kHz core LPCM audio output. 



7.1 DTS Core+96 kHz-Extension encoder 



The block diagram in figure 7.2 shows the main components of the encoding algorithm. The input digital audio signal, 
with a sampling frequency of up to 96 kHz and a word length of up to 24 bits, is processed in the core branch and the 
extension branch. In the core branch the input audio is low-pass filtered to reduce its bandwidth to below 24 kHz and 
then decimated by a factor of two, resulting in a 48 kHz sampled audio signal. The purpose of this LPF decimation is to 
remove signal components that cannot be represented by the core algorithm. The down sampled audio signal is processed 
in a 32-band analysis cosine modulated filter bank that produces the core sub-band samples. Based on the energy 
contained in each of the sub-bands and on the configuration of the core encoder, the core bit allocation routine 
determines the desired quantization scheme for each sub-band. The core sub-band encoder performs quantization and 
encoding, after which the audio codes and side information are delivered to the packer. The packer assembles this data 
into a core bit stream. 
Figure 7.2: The block diagram of DTS Core+Extension encoder 

In the extension branch a delayed version of the input audio is processed in a 64-band analysis cosine modulated filter 
bank that produces the extension sub-band samples. Inverse quantization of the core audio codes produces the 
reconstructed core sub-band samples. Subtracting these samples from the extension sub-band samples in the lower 
32 bands generates the residual sub-band samples. The residual signals in the upper 32 sub-bands are the unaltered 
extension sub-band samples in the corresponding bands. The input audio is delayed such that the reconstructed core 
sub-band samples and the extension sub-band samples in the lower 32 bands are time-aligned before the residual signals 
are produced, i.e.: 



$$\mathrm{Delay} = \mathrm{Delay}_{\mathrm{DecimationLPF}} + \mathrm{Delay}_{\mathrm{CoreQMF}} - \mathrm{Delay}_{\mathrm{ExtensionQMF}}$$
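The delay relation above can be evaluated directly once the three component delays are known. This sketch assumes all three delays are expressed in samples at the 96 kHz input rate (a core QMF delay known in 48 kHz samples would first have to be doubled); the input values in the example are hypothetical and are not taken from the specification.

```python
def extension_input_delay(delay_decimation_lpf: int,
                          delay_core_qmf: int,
                          delay_extension_qmf: int) -> int:
    """Delay applied to the input audio in the extension branch so that the
    reconstructed core and extension sub-band samples are time-aligned.

    Assumption: all arguments are in samples at the 96 kHz input rate.
    """
    delay = delay_decimation_lpf + delay_core_qmf - delay_extension_qmf
    if delay < 0:
        # The extension branch cannot be advanced in time, only delayed.
        raise ValueError("extension QMF delay exceeds the core path delay")
    return delay
```

With hypothetical delays of 100, 1 000 and 500 samples, the extension branch input would be delayed by 600 samples.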



Based on the energy of the residuals in each of the sub-bands and on the configuration of the extension encoder, the 
extension bit allocation routine determines the desired quantization scheme for each of the 64 sub-bands. The residual 
samples in the sub-bands are encoded using a combination of adaptive prediction, scalar/vector quantization and/or 
Huffman coding to produce the residual codes and extension side information. The packer assembles this data into an 
extension bit stream. 
7.2 DTS Core+96 kHz Extension decoder 



On the decoder side the core and extension parts of the encoded bit stream are fed to their respective sub-band decoders. 
The reconstructed core sub-band samples are added to the corresponding residual sub-band samples in the lower 32 bands. 
The reconstructed residual sub-band samples in the upper 32 bands remain unaltered. Passing the resulting extension 
sub-band samples through the 64-band synthesis QMF filter bank produces the 96 kHz sampled PCM audio. Figure 7.3 
shows the block diagram of the core+extension decoder. 
Figure 7.3: The block diagram of DTS Core+Extension decoder 

If the encoded bit stream does not contain extension data, the decoder, depending on its hardware configuration, uses 
either: a) a 32-band QMF with the core sub-band samples as inputs to synthesize 48 kHz sampled PCM audio; or b) a 
64-band QMF whose inputs are the core sub-band samples in the lower 32 bands and "zero" samples in the upper 32 bands, 
to synthesize interpolated PCM audio sampled at 96 kHz. 
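The two fallback options can be sketched as a choice of synthesis input for one time slot when no extension data is present. The function name and the boolean `use_64_band_qmf` (standing in for the decoder's hardware configuration) are hypothetical, and the synthesis filter banks themselves are not shown.

```python
NUM_CORE_BANDS = 32


def synthesis_input_without_extension(core_subband: list[float],
                                      use_64_band_qmf: bool) -> list[float]:
    """Return the sub-band vector fed to the synthesis QMF for one time slot.

    use_64_band_qmf=False: option a), 32 core bands into a 32-band QMF (48 kHz).
    use_64_band_qmf=True:  option b), core bands in the lower 32 bands and zero
                           samples in the upper 32 bands of a 64-band QMF (96 kHz).
    """
    if len(core_subband) != NUM_CORE_BANDS:
        raise ValueError("expected 32 core sub-band samples")
    if use_64_band_qmf:
        return list(core_subband) + [0.0] * NUM_CORE_BANDS
    return list(core_subband)
```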

Existing DTS core decoders that receive the core+extension bit stream extract and decode the core data to produce 
48 kHz sampled PCM audio. They ignore the extension data by skipping extraction until the next DTS synchronization 
word. 

7.3 Synchronization 

96 kHz Extension Sync Word SYNC96 V 32 bits 

The synchronization word SYNC96 = 0x1D95F262 for the 96 kHz extension data comes after the core audio data. Note 
that if a channel extension is present, the X96k extension data is placed before the XCh extension data in the encoded 
bit stream. For 16-bit streams the sync word is aligned to a 32-bit word boundary. In the case of 14-bit streams SYNC96 
is aligned to both 32-bit and 28-bit word boundaries, meaning that the 28 MSBs of SYNC96 appear as 0x0765 1F26. 
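The 14-bit appearance of SYNC96 follows from splitting its 28 MSBs into two 14-bit values, each occupying the low 14 bits of one 16-bit word. The sketch below reproduces the 0x0765 1F26 pattern quoted above; the function name is hypothetical.

```python
SYNC96 = 0x1D95F262


def sync96_as_14bit_words() -> tuple[int, int]:
    """Split the 28 MSBs of SYNC96 into two 14-bit values, as they appear
    in the low 14 bits of consecutive 16-bit words of a 14-bit stream."""
    top28 = SYNC96 >> 4            # drop the 4 LSBs: 0x1D95F26
    hi14 = (top28 >> 14) & 0x3FFF  # first 14 bits
    lo14 = top28 & 0x3FFF          # last 14 bits
    return hi14, lo14
```

Calling `sync96_as_14bit_words()` returns `(0x0765, 0x1F26)`.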

To reduce the probability of false synchronization caused by pseudo sync words, it is imperative to check the distance 
between the detected sync word and the end of the current frame (as indicated by FSIZE). This distance in bytes must 
match the value of FSIZE96 (see below). 
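The pseudo-sync rejection test can be sketched as a check on the FSIZE96 field. The byte positions `sync_offset` and `frame_end` are hypothetical inputs (the decoder derives the frame end from FSIZE); unlike the XCh case, the text calls for a match against the single value FSIZE96, and the sketch also rejects values in the invalid range 0 to 94.

```python
def x96k_sync_accepted(sync_offset: int, frame_end: int, fsize96: int) -> bool:
    """Accept a detected SYNC96 only if its byte distance to the end of the
    current frame matches FSIZE96 and FSIZE96 lies in the valid range.

    sync_offset, frame_end: hypothetical byte positions of the detected sync
        word and of the end of the current frame.
    """
    if not 95 <= fsize96 <= 4095:
        return False
    return frame_end - sync_offset == fsize96
```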
After decoder synchronization is established, a flag nX96kPresent is set and the decoder output sampling frequency 
is selected as follows: 

Pseudo code: 

OutSamplingFreq = SFREQ; 
if (nX96kPresent) 
    OutSamplingFreq = 2 * OutSamplingFreq; 

Note that SFREQ corresponds to the sampling frequency of the audio reconstructed in the core decoder. 
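The pseudo code above translates directly into a runnable function. The sketch assumes SFREQ has already been converted from its header code to a frequency in Hz; the function name is hypothetical.

```python
def output_sampling_freq(sfreq_hz: int, x96k_present: bool) -> int:
    """Decoder output sampling frequency: the core SFREQ, doubled when the
    X96k extension has been synchronized (nX96kPresent set)."""
    out = sfreq_hz
    if x96k_present:
        out = 2 * out
    return out
```

For a 48 kHz core with X96k present this gives 96 000 Hz; for a 44,1 kHz core it gives 88 200 Hz.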



7.4 X96k frame header 



96 kHz Extension Frame Byte Data Size FSIZE96 V 12 bits 

(FSIZE96 + 1) is the byte size of the 96 kHz extension data plus any other extension data that appears between FSIZE96 
and the end of the current frame. Valid range for FSIZE96: 95 to 4 095; invalid range: 0 to 94. 

Revision Number REVNO ACC/NV 4 bits 

Revision number for the high frequency extension processing algorithm. 
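The two X96k header fields can be unpacked from a single 16-bit big-endian word. This sketch assumes FSIZE96 (12 bits) is immediately followed by REVNO (4 bits) in the bytes right after SYNC96, in the order the fields are listed here; the function name is hypothetical.

```python
def parse_x96k_header(header: bytes) -> tuple[int, int]:
    """Return (FSIZE96, REVNO) from the bytes following SYNC96.

    Assumption (not confirmed by the text): FSIZE96 (12 bits) is directly
    followed by REVNO (4 bits), MSB-first, filling exactly 2 bytes.
    """
    if len(header) < 2:
        raise ValueError("need at least 2 bytes for FSIZE96 + REVNO")
    word = int.from_bytes(header[:2], "big")
    fsize96 = word >> 4   # upper 12 bits
    revno = word & 0xF    # lower 4 bits
    # Valid range for FSIZE96 is 95 to 4 095; values 0 to 94 are invalid.
    if fsize96 < 95:
        raise ValueError(f"invalid FSIZE96 {fsize96}")
    return fsize96, revno
```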

Table 7.1: X96k Algorithm Revision Number 

REVNO      Frequency Extension Encoder Software Revision Number 
0          Reserved 
1          Current 
2 to 7     Future revision (compatible with the original Rev1.0 specification) 
8 to 15    Future revision (incompatible with the original Rev1.0 specification) 



NOTE: If the decoder is not compatible with some algorithm revisions (REVNO > 7), it must ignore the X96k 
extension stream and reconstruct the core encoded audio components up to 24 kHz/22,05 kHz. 
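Table 7.1 and the NOTE suggest a simple policy for a decoder that implements only Rev1.0-compatible revisions. This sketch returns a hypothetical action label; how a decoder should treat the reserved value 0 is not stated in the text, so it is reported separately rather than guessed.

```python
def x96k_revno_policy(revno: int) -> str:
    """Classify REVNO for a decoder implementing Rev1.0-compatible revisions.

    Returns "decode" (1 current, 2 to 7 compatible), "ignore" (8 to 15
    incompatible: skip X96k and reconstruct only the core), or "reserved" (0).
    """
    if not 0 <= revno <= 15:
        raise ValueError("REVNO is a 4-bit field")
    if revno == 0:
        return "reserved"
    if revno <= 7:
        return "decode"
    return "ignore"
```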